BBVA Success - Definitions of Success

Magdalena Bennett¹

William Fuchs²

Jaime Millán³

October 3rd, 2024

Edits

  • 09/17/24:
    • Replaced q634 and q635 with their ratio and removed the original variables (including a missing-value dummy).
    • Removed “Business situation” from the prediction models.
    • Tested removing the “cluster” variable from the entire-sample prediction and removing the definition-of-success variables.
    • Added a love plot beside the varImp plots.
  • 09/18/24:
    • Changed max and min income to a ratio with respect to assets in the first loan.
    • Fixed 5 observations that had a missing ratio (both q634 and q635 were 0).
    • Added preliminary code for selecting 10 variables.
  • 09/26/24:
    • Imputed missing survey age (148) with admin data.
    • Changed max and min income to logarithms to treat outliers and to avoid high means in standardization.
    • Checked the ratio of resources needed to increase in income (transformed to logs).
  • 10/03/24:
    • Changed max and min income to percentile (1-100).
    • Changed ratio of investment and income growth to percentile (1-100), and also added the amount of the investment as a variable.
    • Included cluster variable in the model selection.
  • 10/14/24:
    • Multiple iterations of RFE for variable selection (05b_rfe_for_variable_selection.R)
    • Comparison between average business and business predicted as successful.

Definition of Success

In this section, we analyze the characteristics of clients according to their definition of success. We will use the following questions to build definitions of success in a data-driven way:

  1. Why do you consider yourself a successful person? (Open)

  2. For you, a successful person is mainly someone who:
     • Is recognized by the community.
     • Has stability at home.
     • Has stability in their business.

  3. For you, a business is successful if:
     • It has managed to grow.
     • It has managed to maintain itself.
     • It has generated employment.
     • It has generated a positive change in the community.
     • It allows me to cover household necessities.

  4. Imagine that everything goes well, what would be the size of your business in 5 years?
     • Smaller/the same as now.
     • Larger than now.
     • Much larger than now.

  5. Imagine that everything goes well, how many people would work in your business in 5 years? (% growth)

  6. If the opportunity presented itself to work in a company with a permanent contract, what would you do?
     • Sell my current business, or close it if I can’t sell it.
     • Keep my current business (in parallel or managed by another person).
     • Wouldn’t accept it.

  7. What amount of money would you accept for a job? (1,000 US$)

  8. You would like your children to:
     • Be entrepreneurs and have their own business.
     • Manage the family business.
     • Have a secure job.

Defining Success

Using the previous questions, we perform a hierarchical clustering analysis (HCA) on the customers’ responses and split the sample into two clusters to assess differences in definitions of success. To sharpen the contrast between the two groups, we exclude the 10% of observations that lie closest to the other cluster’s center, dropping 184 of 1833 observations and leaving 1649 (1188 in Cluster 1 and 461 in Cluster 2).
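The trimming step can be sketched as follows. This is a minimal illustration in Python, not the authors' code: it assumes Euclidean distance to the cluster centroids and that the cut is the closest 10% of all observations, rounded up. Neither detail is stated in the text, and the function names are hypothetical.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length numeric vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def trim_closest_to_other_center(points, labels, share=0.10):
    """Return the indices kept after dropping the `share` of observations
    that lie closest to the centroid of the cluster they do NOT belong to.
    Assumes exactly two clusters and Euclidean distance (both assumptions)."""
    groups = sorted(set(labels))
    if len(groups) != 2:
        raise ValueError("this sketch expects exactly two clusters")
    centers = {g: centroid([p for p, l in zip(points, labels) if l == g])
               for g in groups}
    other = {groups[0]: groups[1], groups[1]: groups[0]}
    # Distance from each observation to the opposite cluster's centroid.
    dist = [math.dist(p, centers[other[l]]) for p, l in zip(points, labels)]
    n_drop = math.ceil(share * len(points))
    order = sorted(range(len(points)), key=lambda i: dist[i])
    dropped = set(order[:n_drop])
    return [i for i in range(len(points)) if i not in dropped]

# Toy data: two well-separated groups on a line.
points = [[0.0], [1.0], [2.0], [3.0], [4.0], [6.0], [7.0], [8.0], [9.0], [10.0]]
labels = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
kept = trim_closest_to_other_center(points, labels, share=0.2)
print(kept)
```

Under this rounding rule, a 10% share of 1833 observations drops ceil(183.3) = 184 observations, which is consistent with the count reported above.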

The following table shows the comparison between both clusters in terms of the previous data:

Table 1. Balance table across clients’ responses according to clusters
Columns, in order: Cluster 1 (N=1188) Mean and Std. Dev.; Cluster 2 (N=461) Mean and Std. Dev.; Diff. in Means (Cluster 2 minus Cluster 1); p-value.
What makes you successful? - business 0.121 0.327 0.158 0.365 0.037 0.057
What makes you successful? - family 0.024 0.152 0.000 0.000 -0.024 0.000
A successful person: Is recognized by the community 0.200 0.400 0.007 0.080 -0.194 0.000
A successful person: Has stability in her home 0.383 0.486 0.449 0.498 0.066 0.015
A successful person: Has stability in her business 0.417 0.493 0.544 0.499 0.128 0.000
A successful business: Has managed to grow 0.355 0.479 0.360 0.481 0.005 0.853
A successful business: Has managed to sustain 0.216 0.412 0.260 0.439 0.044 0.064
A successful business: Has generated employment 0.136 0.343 0.174 0.379 0.037 0.067
A successful business: Has brought about a positive change in the community 0.102 0.303 0.000 0.000 -0.102 0.000
A successful business: Covers the household needs 0.190 0.393 0.206 0.405 0.016 0.472
Business would be much larger 0.608 0.488 0.577 0.495 -0.031 0.256
Business would be larger 0.000 0.000 0.000 0.000 0.000
Business would be equal/smaller 0.392 0.488 0.423 0.495 0.031 0.256
How many workers would you have? (prop. over current workers) 3.421 3.515 3.579 4.152 0.157 0.472
If offered a job: Accept and keep 0.788 0.409 0.061 0.239 -0.727 0.000
If offered a job: Close or sell 0.000 0.000 0.000 0.000 0.000
If offered a job: Not accept 0.212 0.409 0.939 0.239 0.727 0.000
What amount of money would a job have to pay? (1,000 US$) 5.187 5.439 11.976 0.535 6.789 0.000
What amount of money would a job have to pay? (No amount) 0.385 0.487 0.998 0.047 0.613 0.000
You would like your children: Were entrepreneurs and had their own business 0.573 0.495 0.644 0.479 0.071 0.008
You would like your children: Took charge of the family business (their business) 0.238 0.426 0.356 0.479 0.118 0.000
You would like your children: Had a secure job 0.189 0.391 0.000 0.000 -0.189 0.000
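The p-values in Table 1 can be approximately recovered from the summary statistics alone. The sketch below assumes an unequal-variance (Welch-type) standard error and a normal approximation for the p-value. The source does not say which test was used, so this is an illustration rather than a description of the authors' procedure.

```python
import math

def welch_z_test(mean1, sd1, n1, mean2, sd2, n2):
    """Difference in means (group 2 minus group 1), its unequal-variance
    standard error, and a two-sided p-value from the normal approximation."""
    diff = mean2 - mean1
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return diff, se, z, p

# First row of Table 1: "What makes you successful? - business"
diff, se, z, p = welch_z_test(0.121, 0.327, 1188, 0.158, 0.365, 461)
print(f"diff={diff:.3f}  z={z:.2f}  p={p:.3f}")
```

For this row the sketch gives a difference of 0.037 and a p-value of about 0.057, in line with the rounded values in the table.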

The following plot shows the relative importance of the previous variables, measured as the difference between clusters in standard deviations. Negative differences imply a higher value for Cluster 2 and positive differences imply a higher value for Cluster 1. (The Diff. in Means column of Table 1 uses the opposite sign: Cluster 2 minus Cluster 1.)

Figure 1. Variable Importance in Cluster Formation
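As a rough illustration of the quantity plotted in Figure 1, the standardized difference for a variable can be computed from the means and standard deviations in Table 1. The denominator below (the root of the average of the two variances) is an assumption, since the text does not state which one the plot uses. The sign follows the plot's description (Cluster 1 minus Cluster 2).

```python
import math

def standardized_mean_difference(mean1, sd1, mean2, sd2):
    """(mean1 - mean2) divided by the root of the average of the two variances."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd

# Two rows from Table 1 as (Cluster 1 mean, SD, Cluster 2 mean, SD).
rows = {
    "If offered a job: Accept and keep": (0.788, 0.409, 0.061, 0.239),
    "A successful person: Has stability in her business": (0.417, 0.493, 0.544, 0.499),
}
smd = {name: standardized_mean_difference(*vals) for name, vals in rows.items()}

# Rank variables by the absolute size of the standardized difference.
ranked = sorted(smd, key=lambda name: abs(smd[name]), reverse=True)
for name in ranked:
    print(f"{smd[name]:+.2f}  {name}")
```

With these inputs, the job-offer variable separates the clusters by roughly two standard deviations, far more than the stability-in-business variable.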


Characteristics by cluster

In this section, we compare different characteristics by cluster, as previously defined. Clients in Cluster 2 have better loan characteristics and business performance, but they are otherwise similar in terms of risk and loss aversion and investment decisions. Surprisingly, individuals in Cluster 2 have lower education than those in Cluster 1 (44.9% vs. 52.9% with secondary education or more) and are more likely to contribute more than half of their household’s income (54.9% vs. 48.8%).

Table 2. Balance table across clients’ characteristics according to clusters - Loan characteristics and Business Performance
Columns, in order: Cluster 1 mean (N=1188); Cluster 2 mean (N=461); Diff. in Means (Cluster 2 minus Cluster 1).
1st loan amount -0.027 0.189 0.216
Surplus on 1st loan (SD) -0.014 0.172 0.186
Surplus on last loan (SD) 0.007 0.174 0.167
Expected number of workers 6.843 8.197 1.354
Regarding the first year of activity of your business, the annual sales of your business are - They are much higher 0.228 0.315 0.086
Regarding the first year of activity of your business, the annual sales of your business are - They are higher, but not by much. 0.439 0.430 -0.009
Regarding the first year of activity of your business, the annual sales of your business are - They are the same. 0.115 0.080 -0.035
Regarding the first year of activity of your business, the annual sales of your business are - They are lower. 0.169 0.143 -0.026
Regarding the first year of activity of your business, the annual sales of your business are - I’ve had my business for less than a year 0.012 0.007 -0.005
Note: Cells are highlighted according to whether the means are statistically different at conventional levels.

Table 3. Balance table across clients’ characteristics according to clusters - Investment decisions
Columns, in order: Cluster 1 mean (N=1188); Cluster 2 mean (N=461); Diff. in Means (Cluster 2 minus Cluster 1).
If you received $4,000, you would use that money for growing your business 0.812 0.826 0.014
When you have gains from your business, you usually invest in growing your business 0.532 0.562 0.030
Note: Cells are highlighted according to whether the means are statistically different at conventional levels.

Table 4. Balance table across clients’ characteristics according to clusters - Attitudes
Columns, in order: Cluster 1 mean (N=1188); Cluster 2 mean (N=461); Diff. in Means (Cluster 2 minus Cluster 1).
Loss aversion (-) 3.074 3.239 0.165
Risk aversion (-) 2.253 2.174 -0.079
Dishonesty (proxy) 1.253 1.215 -0.039
When presented with an opportunity, you think of: The potential to increase your income. 0.648 0.677 0.029
When presented with an opportunity, you think of: The possible risks. 0.109 0.100 -0.010
With which phrase do you identify the most? Nothing ventured, nothing gained 0.497 0.464 -0.032
With which phrase do you identify the most? A bird in the hand is worth two in the bush 0.112 0.167 0.055
Note: Cells are highlighted according to whether the means are statistically different at conventional levels.

Table 5. Balance table across clients’ characteristics according to clusters - Demographic characteristics
Columns, in order: Cluster 1 mean (N=1188); Cluster 2 mean (N=461); Diff. in Means (Cluster 2 minus Cluster 1).
Gender - Female 0.572 0.570 -0.001
Secondary education or more - Yes 0.529 0.449 -0.080
Household income - Contribute more than 50% 0.488 0.549 0.061
Uses WhatsApp Yes 0.852 0.866 0.014
Age 45.960 48.707 2.748
Colombia 0.229 0.299 0.070
Dominican Republic 0.234 0.302 0.068
Panama 0.217 0.165 -0.052
Peru 0.320 0.234 -0.086
Note: Cells are highlighted according to whether the means are statistically different at conventional levels.

Balancing Business Performance

We are also interested in whether individuals’ definitions of success are mainly mediated by the performance of their own business. For that reason, we match observations in Cluster 2 to observations in Cluster 1 so that the means of the business performance variables are balanced within a tolerance of 0.025 SD. We are able to match 100% of the observations in Cluster 2, obtaining the following pre- and post-matching differences:

Table 7. Balance of business performance before and after matching
Columns, in order: pre-matching Mean Cluster 2, Mean Cluster 1, and Diff (SD); post-matching Mean Cluster 2, Mean Cluster 1, and Diff (SD).
Surplus - first loan 0.172 -0.014 0.187*** 0.172 0.15 0.02
Surplus - first loan (missing) 0.022 0.031 -0.059 0.022 0.024 -0.015
Surplus - last loan 0.174 0.007 0.164*** 0.174 0.153 0.02
Surplus - last loan (missing) 0.069 0.063 0.025 0.069 0.065 0.017
Amount - first loan 0.189 -0.027 0.194*** 0.189 0.166 0.019
Amount - first loan (missing) 0.004 0.011 -0.076 0.004 0.004 0
Amount - last loan 0.194 0.005 0.176*** 0.194 0.177 0.015
Amount - last loan (missing) 0.004 0.011 -0.076 0.004 0.004 0
Assets - first loan 0.062 -0.021 0.089 0.062 0.059 0.003
Assets - first loan (missing) 0.208 0.275 -0.157*** 0.208 0.219 -0.026
Assets - last loan 0.128 -0.031 0.156*** 0.128 0.109 0.018
Assets - last loan (missing) 0.035 0.061 -0.125** 0.035 0.03 0.024
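The balance criterion behind Table 7 can be illustrated with a small sketch. It swaps in greedy 1:1 nearest-neighbor matching without replacement, which is a simpler technique than whatever procedure produced the table (the text does not name it), and then checks whether every covariate mean differs by at most 0.025 SD. The SD scale (combined sample) and all function names are assumptions.

```python
import math
from statistics import mean, pstdev

def greedy_match(cluster2, cluster1):
    """1:1 nearest-neighbor matching without replacement (Euclidean distance).
    Returns, for each Cluster 2 row, the index of its matched Cluster 1 row."""
    available = list(range(len(cluster1)))
    matched = []
    for row in cluster2:
        best = min(available, key=lambda i: math.dist(row, cluster1[i]))
        matched.append(best)
        available.remove(best)
    return matched

def max_abs_std_diff(group_a, group_b):
    """Largest absolute difference in covariate means, in SD units of the combined sample."""
    worst = 0.0
    for j in range(len(group_a[0])):
        a = [r[j] for r in group_a]
        b = [r[j] for r in group_b]
        scale = pstdev(a + b)
        if scale > 0:
            worst = max(worst, abs(mean(a) - mean(b)) / scale)
    return worst

TOLERANCE = 0.025  # tolerance quoted in the text, in SD units

# Toy data with a single business-performance covariate.
cluster2 = [[0.0], [1.0]]
cluster1 = [[0.1], [5.0], [0.9], [-3.0]]

idx = greedy_match(cluster2, cluster1)
matched_cluster1 = [cluster1[i] for i in idx]
before = max_abs_std_diff(cluster2, cluster1)
after = max_abs_std_diff(cluster2, matched_cluster1)
print(idx, before > TOLERANCE, after <= TOLERANCE)
```

Greedy matching does not guarantee that the tolerance is met. In practice the check would be run after matching and the procedure adjusted if any covariate falls outside it.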

Looking at the differences between the matched observations, we can see that there are still significant differences between Cluster 1 and Cluster 2 in their perspectives on success, even after adjusting for business performance:

Table 6. Balance table across clients’ responses according to clusters - After matching
Columns, in order: Cluster 1 (N=461) Mean and Std. Dev.; Cluster 2 (N=461) Mean and Std. Dev.; Diff. in Means (Cluster 2 minus Cluster 1); p-value.
What makes you successful? - business 0.119 0.325 0.158 0.365 0.039 0.087
What makes you successful? - family 0.020 0.139 0.000 0.000 -0.020 0.003
A successful person: Is recognized by the community 0.200 0.400 0.007 0.080 -0.193 0.000
A successful person: Has stability in her home 0.395 0.489 0.449 0.498 0.054 0.096
A successful person: Has stability in her business 0.406 0.492 0.544 0.499 0.139 0.000
A successful business: Has managed to grow 0.349 0.477 0.360 0.481 0.011 0.731
A successful business: Has managed to sustain 0.228 0.420 0.260 0.439 0.033 0.251
A successful business: Has generated employment 0.132 0.339 0.174 0.379 0.041 0.082
A successful business: Has brought about a positive change in the community 0.119 0.325 0.000 0.000 -0.119 0.000
A successful business: Covers the household needs 0.171 0.377 0.206 0.405 0.035 0.178
Business would be much larger 0.620 0.486 0.577 0.495 -0.043 0.179
Business would be equal/smaller 0.380 0.486 0.423 0.495 0.043 0.179
How many workers would you have? (prop. over current workers) 3.445 3.644 3.579 4.152 0.134 0.603
If offered a job: Accept and keep 0.790 0.408 0.061 0.239 -0.729 0.000
If offered a job: Not accept 0.210 0.408 0.939 0.239 0.729 0.000
What amount of money would a job have to pay? (1,000 US$) 5.388 5.480 11.976 0.535 6.588 0.000
What amount of money would a job have to pay? (No amount) 0.403 0.491 0.998 0.047 0.594 0.000
You would like your children: Were entrepreneurs and had their own business 0.577 0.495 0.644 0.479 0.067 0.036
You would like your children: Took charge of the family business (their business) 0.241 0.428 0.356 0.479 0.115 0.000
You would like your children: Had a secure job 0.182 0.386 0.000 0.000 -0.182 0.000
Table 8. Balance table between Cluster 1 before and after matching
Columns, in order: Original Cluster 1 (N=1188) Mean and Std. Dev.; Matched Cluster 1 (N=461) Mean and Std. Dev.; Diff. in Means (Matched minus Original); p-value.
What makes you successful? - business 0.121 0.327 0.119 0.325 -0.002 0.915
What makes you successful? - family 0.024 0.152 0.020 0.139 -0.004 0.605
A successful person: Is recognized by the community 0.200 0.400 0.200 0.400 -0.001 0.972
A successful person: Has stability in her home 0.383 0.486 0.395 0.489 0.012 0.660
A successful person: Has stability in her business 0.417 0.493 0.406 0.492 -0.011 0.683
A successful business: Has managed to grow 0.355 0.479 0.349 0.477 -0.006 0.820
A successful business: Has managed to sustain 0.216 0.412 0.228 0.420 0.011 0.618
A successful business: Has generated employment 0.136 0.343 0.132 0.339 -0.004 0.829
A successful business: Has brought about a positive change in the community 0.102 0.303 0.119 0.325 0.017 0.318
A successful business: Covers the household needs 0.190 0.393 0.171 0.377 -0.019 0.368
Business would be much larger 0.608 0.488 0.620 0.486 0.013 0.636
Business would be equal/smaller 0.392 0.488 0.380 0.486 -0.013 0.636
How many workers would you have? (prop. over current workers) 3.421 3.515 3.445 3.644 0.023 0.906
If offered a job: Accept and keep 0.788 0.409 0.790 0.408 0.002 0.939
If offered a job: Not accept 0.212 0.409 0.210 0.408 -0.002 0.939
What amount of money would a job have to pay? (1,000 US$) 5.187 5.439 5.388 5.480 0.201 0.503
What amount of money would a job have to pay? (No amount) 0.385 0.487 0.403 0.491 0.019 0.485
You would like your children: Were entrepreneurs and had their own business 0.573 0.495 0.577 0.495 0.004 0.889
You would like your children: Took charge of the family business (their business) 0.238 0.426 0.241 0.428 0.003 0.913
You would like your children: Had a secure job 0.189 0.391 0.182 0.386 -0.006 0.766

Analyzing potential differences in other outcomes, we obtain the following results after matching:

Table 9. Difference in characteristics between Cluster 1 and Cluster 2 - After Matching
Columns, in order: Cluster 1 (N=461) Mean and Std. Dev.; Cluster 2 (N=461) Mean and Std. Dev.; Diff. in Means (Cluster 2 minus Cluster 1); p-value.
If you received $4,000, you would use that money for growing your business 0.798 0.402 0.826 0.379 0.028 0.273
When you have gains from your business, you usually invest in growing your business 0.518 0.500 0.562 0.497 0.043 0.187
What size would your business be in 5 years? - Much larger/larger 0.620 0.486 0.577 0.495 -0.043 0.179
Expected number of workers 6.833 7.911 8.197 16.473 1.364 0.109
Loss aversion (-) 3.102 2.105 3.239 2.123 0.137 0.327
Risk aversion (-) 2.215 1.881 2.174 1.985 -0.041 0.746
Dishonesty (proxy) 1.154 1.078 1.215 1.065 0.061 0.390
When presented with an opportunity, you think of: The potential to increase your income. 0.625 0.485 0.677 0.468 0.052 0.098
When presented with an opportunity, you think of: The possible risks. 0.121 0.327 0.100 0.300 -0.022 0.294
With which phrase do you identify the most? Nothing ventured, nothing gained 0.453 0.498 0.464 0.499 0.011 0.741
With which phrase do you identify the most? A bird in the hand is worth two in the bush 0.121 0.327 0.167 0.373 0.046 0.049
Regarding the first year of activity of your business, the annual sales of your business are - They are much higher 0.247 0.432 0.315 0.465 0.067 0.023
Regarding the first year of activity of your business, the annual sales of your business are - They are higher, but not by much. 0.432 0.496 0.430 0.496 -0.002 0.947
Regarding the first year of activity of your business, the annual sales of your business are - They are the same. 0.089 0.285 0.080 0.272 -0.009 0.636
Regarding the first year of activity of your business, the annual sales of your business are - They are lower. 0.182 0.386 0.143 0.351 -0.039 0.108
Regarding the first year of activity of your business, the annual sales of your business are - I’ve had my business for less than a year 0.004 0.066 0.007 0.080 0.002 0.654
Household income - Contribute more than 50% 0.497 0.501 0.549 0.498 0.052 0.114
Gender - Female 0.570 0.496 0.570 0.496 0.000 1.000
Uses WhatsApp Yes 0.861 0.346 0.866 0.342 0.004 0.848
Secondary education or more - Yes 0.505 0.501 0.449 0.498 -0.056 0.087

Prediction of Success

We are interested in using the information available to see which characteristics would be good predictors of success, especially depending on the definition of success the individual has. For this, we will use two different definitions of success: (i) Loan growth and timely payments, and (ii) Success score.

The first measure, which was used to stratify the sampling of our survey population, identifies clients whose loan amount grew between their first and last registered loan and who made overall timely payments. The second measure identifies micro-entrepreneurs in the top quartile of a success score built on their (standardized) growth in assets, loan amount, and surplus.

Measure 1: Loan growth and timely payments

For the entire sample of individuals who currently own a business and have had more than one loan, 51.9% are classified as successful (out of … individuals). Comparing this measure across the two definitions of success (community/family-oriented vs. business-oriented), the share of successful clients is balanced between groups, with a slight advantage for Cluster 2 (business-oriented).

To predict success for the entire sample (only including individuals who currently have a business and have had more than one loan), we use xgboost as the prediction model. Accuracy is low, with an average accuracy rate of 54.8%, only about 3 percentage points above the 51.9% share of clients classified as successful.

Figure 2. Variable importance for success variable prediction - Entire sample


Restricting the sample to Cluster 1 (community/family oriented), we get an accuracy rate of 47.9%.

Figure 3. Variable importance for success variable prediction - Community/family oriented group (Cluster 1)


Restricting the sample to Cluster 2 (business oriented), we get an accuracy rate of 48.3%.

Figure 4. Variable importance for success variable prediction - Business oriented group (Cluster 2)


Measure 2: Success score

We define success score as:

\[\text{Success Score}_i = \frac{1}{3}\left(\Delta \text{Loan Amount}_i + \Delta \text{Assets}_i + \Delta \text{Surplus}_i\right)\]

where \(\Delta X\) represents the change in \(X\) between the first loan and the last available loan. All variables are standardized so that they are comparable. If an individual is missing one of these variables, its weight is redistributed among the available ones. We then predict the probability of being in the top 25% of this score. In terms of our previous definitions of success, individuals with a business-oriented definition of success are more likely to be in the top 25% (30.7%) than those in the community/family-oriented cluster (23%).
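A minimal sketch of the score and the top-quartile flag, assuming the three changes have already been standardized and that missing values are marked with None. The function names and the quantile rule (Python's default exclusive method) are illustrative choices, not taken from the source.

```python
from statistics import quantiles

def success_score(d_loan, d_assets, d_surplus):
    """Average of the available standardized changes; None marks a missing value.
    With all three present each gets weight 1/3; with two present, 1/2 each."""
    available = [x for x in (d_loan, d_assets, d_surplus) if x is not None]
    if not available:
        return None
    return sum(available) / len(available)

def top_quartile_flags(scores):
    """True where a score lies above the 75th percentile of the observed scores."""
    observed = [s for s in scores if s is not None]
    q3 = quantiles(observed, n=4)[2]
    return [s is not None and s > q3 for s in scores]

# Toy standardized changes: (loan amount, assets, surplus).
scores = [success_score(*row) for row in [
    (0.3, 0.6, 0.9),     # complete record
    (1.0, None, 3.0),    # assets missing -> (1.0 + 3.0) / 2
    (-0.5, -0.1, None),  # surplus missing
    (0.0, 0.2, 0.1),
]]
flags = top_quartile_flags(scores)
print(scores, flags)
```

Averaging only the available components is equivalent to redistributing the missing component's weight equally among the remaining ones.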

Using the same xgboost method as for the previous measure, we have an accuracy in prediction of 74.3% for the entire sample.
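When reading these accuracy figures, it can help to compare them with the accuracy of always predicting the more common class. The sketch below does that arithmetic for the two entire-sample results. It assumes that the share of positives in the evaluation data equals the shares quoted in the text (51.9% and 25%), which the source does not confirm.

```python
def majority_baseline(positive_share):
    """Accuracy obtained by always predicting the more common class."""
    return max(positive_share, 1.0 - positive_share)

# (reported accuracy, assumed share of positives), both taken from the text.
reported = {
    "Measure 1 (loan growth and timely payments)": (0.548, 0.519),
    "Measure 2 (top 25% of success score)": (0.743, 0.25),
}

gains = {}
for name, (accuracy, positive_share) in reported.items():
    base = majority_baseline(positive_share)
    gains[name] = accuracy - base
    print(f"{name}: accuracy {accuracy:.1%}, baseline {base:.1%}, gain {gains[name]:+.1%}")
```

Under that assumption, the Measure 2 accuracy is close to the 75% obtained by labeling everyone as outside the top quartile, so ranking-based summaries such as the ROC curves are likely more informative than accuracy for this outcome.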

Figure 5. Variable importance for Top 25% of Success Score prediction - Entire sample


Figure 6. ROC - Entire Sample


We can also examine how the predicted probability changes across the range of some of the most important predictors in the data:

Figure 7. Partial dependence between Top 25% Success and top predictors of full model


Figure 8. Partial dependence between Top 25% Success and top predictors of full model for Growth and Investment

For the group that has a community/family vision of success, we have an accuracy in prediction of 73.5%.

Figure 9. Variable importance for Top 25% of Success Score prediction - Community/Family oriented group (Cluster 1)


Figure 10. ROC - Community/Family oriented group (Cluster 1)


For the group that has a business-oriented definition of success, we have an accuracy in prediction of 66.7%.

Figure 11. Variable importance for Top 25% of Success Score prediction - Business oriented group (Cluster 2)


Figure 12. ROC - Business oriented group (Cluster 2)


Variable selection

We would like to select the most predictive covariates in our survey (up to 15 features) to reduce the dimensionality of the instrument. For this, we use Recursive Feature Elimination (RFE) in combination with Random Forest to select models with 5, 7, 10, and 15 features. Each model is fitted independently (so the selected feature sets are not necessarily nested), and the procedure is repeated for 10 iterations to find the top variables. The selected model has 7 variables, with an accuracy of 74.4% in the testing data. The selected features are the following:

Table 10. Selected variables through Recursive Feature Elimination
Variables
If everything goes well, what’d be your max monthly income? (Pct)
How much would you have to invest to increase business income? (Pct)
What was the focus of that other loan? - Other investment
You are invited and attend events
Ratio b/w business growth and business investment (desired) (Pct)
Loss aversion (-)
Could you repay a loan of SOL$30k today? Yes
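The selection loop described above can be sketched generically. The code below is an illustration in Python, not the contents of 05b_rfe_for_variable_selection.R. The importance function is a toy stand-in for fitting a Random Forest and reading its variable importance, the feature names are placeholders, and counting how often each feature is selected across iterations is an assumed way of combining the runs.

```python
from collections import Counter

def recursive_feature_elimination(features, importance_fn, n_keep):
    """Refit on the remaining features, drop the least important one, repeat."""
    remaining = list(features)
    while len(remaining) > n_keep:
        importance = importance_fn(remaining)  # refit on the current subset
        weakest = min(remaining, key=lambda f: importance[f])
        remaining.remove(weakest)
    return remaining

def most_frequently_selected(selections, n_keep):
    """Features chosen most often across repeated runs."""
    counts = Counter(f for selected in selections for f in selected)
    return [f for f, _ in counts.most_common(n_keep)]

# Toy stand-in for "fit a model and return its variable importance".
def toy_importance(subset):
    base = {"x1": 0.40, "x2": 0.30, "x3": 0.20, "x4": 0.10}
    return {f: base[f] for f in subset}

one_run = recursive_feature_elimination(["x1", "x2", "x3", "x4"], toy_importance, n_keep=2)

# Pretend three runs (for example, on different resamples) gave these sets.
runs = [one_run, ["x1", "x3"], ["x1", "x2"]]
top = most_frequently_selected(runs, n_keep=2)
print(one_run, top)
```

Because importance is recomputed after every removal, a feature that looks weak next to a correlated competitor can survive once that competitor is dropped, which is the main difference from ranking all features once.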

We can also plot the partial dependence between the outcome variable and the top 4 features. This would give us an idea of how the probability of being assigned to each class changes through the range of values of the predictor, holding other variables constant.

Figure 13. Partial dependence between Top 25% Success Score and most relevant predictors
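For reference, a partial dependence curve of this kind can be computed as below. The sketch assumes only that the fitted model exposes a function mapping a row of features to a predicted probability. The toy prediction function and data are hypothetical.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value
    while every row keeps its own observed values for the other features."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)           # copy so the data are not altered
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

# Hypothetical fitted model: any function from a feature row to a probability.
def toy_predict(row):
    return min(1.0, max(0.0, 0.002 * row[0] + 0.1 * row[1]))

# Feature 0 is on a 1-100 percentile scale, feature 1 is a small integer score.
rows = [[10, 1], [50, 3], [90, 5]]
curve = partial_dependence(toy_predict, rows, feature_index=0, grid=[1, 50, 100])
print(curve)
```

Strictly speaking, the other variables are not fixed at a single value. Each point on the curve averages the prediction over their observed values.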



  1. University of Texas at Austin

  2. University of Texas at Austin

  3. Universidad de Navarra, jmillan@…